Abstract

In this paper we investigate clustering in the weighted setting, in which every data point is assigned a real valued 
weight. We conduct a theoretical analysis on the influence of weighted data on standard clustering algorithms in each 
of the partitional and hierarchical settings, characterising the precise conditions under which such algorithms react 
to weights, and classifying clustering methods into three broad categories: weight-responsive, weight-considering, 
and weight-robust. Our analysis raises several interesting questions and can be directly mapped to the classical 
unweighted setting. 

1 Introduction 

We consider a natural generalisation of the classical clustering problem, where every data point is associated with a real 
valued weight. This generalisation enables more accurate representation of some clustering problems. For example, 
consider vector quantization, which aims to find a compact encoding of signals that has low expected distortion. The
accuracy of the encoding is most important for signals that occur frequently. With weighted data, such a consideration 
is easily captured by having the weights of the points represent signal frequency. Another illustration of the utility of 
weights comes from facility allocation, such as the placement of police stations in a new district. The distributions of 
the stations should enable quick access to most areas in the district. However, the accessibility of different institutions 
to a station may have varying importance. The weighted setting enables a convenient method for prioritising certain 
landmarks over others. 

In this paper, we analyse the behaviour of clustering algorithms on weighted data. Given a data set and a clustering
algorithm, we are interested in understanding how the resulting clustering changes depending on the underlying
weights. We classify clustering algorithms into three categories: those that are affected by weights on all data sets,
those that ignore weights, and those methods that respond to weights on some configurations of the data but not on
others. Among the methods that always respond to weights are several well-known algorithms, such as k-means and
k-median. On the other hand, algorithms such as single-linkage, complete-linkage, and min-diameter ignore weights.

Perhaps the most notable is the last category of algorithms. We find that methods belonging to this category are 
robust to weights when data is sufficiently clusterable, and respond to weights otherwise. The average-linkage algo- 
rithm as well as the well-known spectral objective function, ratio cut, both fall within this category. We characterise 
the precise conditions under which these methods are influenced by weights. Our analysis also reveals the following 
interesting phenomenon: algorithms that are known to perform well in practice (in the classical, unweighted setting), 
tend to be more responsive to weights. For example, k-means is highly responsive to weights while single linkage, 
which often performs poorly in practice [?], is weight robust. 



2 Related Work 

Clustering algorithms are usually analysed in the context of unweighted data. The only related work that we are 
aware of is from the early 1970s. Fisher and Van Ness [?] introduce several properties of clustering algorithms. 
Among these, they mention "point proportion admissibility", which requires that the output of an algorithm should not
change if points are duplicated. They then observe that a few algorithms are point proportion admissible. However, 
clustering algorithms can display a much wider range of behaviours on weighted data than merely satisfying or failing 
to satisfy point proportion admissibility. We carry out a much more extensive analysis of clustering on weighted data, 
characterising the precise conditions under which algorithms respond to weight. 

In addition, Wright [?] proposes a formalisation of cluster analysis consisting of eleven axioms. In two of these 
axioms, the notion of mass is mentioned. Namely, that points with zero mass can be treated as non-existent, and that 
multiple points with mass at the same location are equivalent to one point whose weight is the sum of these masses. 
The idea of mass has not been developed beyond the statements of these axioms in their work. 

3 Background 

A weight function $w$ over $X$ is a function $w : X \to \mathbb{R}^+$. Given a domain set $X$, denote the corresponding weighted
domain by $w[X]$, thereby associating each element $x \in X$ with weight $w(x)$. A distance function is a symmetric
function $d : X \times X \to \mathbb{R}^+ \cup \{0\}$, such that $d(x, y) = 0$ if and only if $x = y$. We consider weighted data sets of the
form $(w[X], d)$, where $X$ is some finite domain set, $d$ is a distance function over $X$, and $w$ is a weight function over
$X$.

A k-clustering $C = \{C_1, C_2, \ldots, C_k\}$ of a domain set $X$ is a partition of $X$ into $1 < k < |X|$ disjoint, non-empty
subsets of $X$ where $\bigcup_i C_i = X$. A clustering of $X$ is a k-clustering for some $1 < k < |X|$. To avoid trivial partitions,
clusterings that consist of a single cluster, or where every cluster has a unique element, are not permitted.

Denote the weight of a cluster $C_i \in C$ by $w(C_i) = \sum_{x \in C_i} w(x)$. For a clustering $C$, let $|C|$ denote the number
of clusters in $C$. For $x, y \in X$ and clustering $C$ of $X$, write $x \sim_C y$ if $x$ and $y$ belong to the same cluster in $C$ and
$x \not\sim_C y$, otherwise.

A partitional clustering algorithm is a function that maps a data set $(w[X], d)$ and an integer $1 < k < |X|$ to a
k-clustering of $X$. A dendrogram $\mathcal{D}$ of $X$ is a pair $(T, M)$ where $T$ is a binary rooted tree and $M : \mathrm{leaves}(T) \to X$
is a bijection. A hierarchical clustering algorithm is a function that maps a data set $(w[X], d)$ to a dendrogram of $X$.

A set $C_0 \subseteq X$ is a cluster in a dendrogram $\mathcal{D} = (T, M)$ of $X$ if there exists a node $x$ in $T$ so that $C_0 = \{M(y) \mid
y \text{ is a leaf and a descendent of } x\}$. For a hierarchical algorithm $A$, $A(w[X], d)$ outputs a clustering $C = \{C_1, \ldots, C_k\}$
if $C_i$ is a cluster in $A(w[X], d)$ for all $1 \leq i \leq k$. A partitional algorithm $A$ outputs clustering $C$ on $(w[X], d)$ if
$A(w[X], d, |C|) = C$.

Given a clustering algorithm $A$ and a data set $(X, d)$, $\mathrm{range}(A(X, d)) = \{C \mid \exists w \text{ such that } A \text{ outputs } C \text{ on }
(w[X], d)\}$, which is the set of clusterings that $A$ outputs on $(X, d)$ over all possible weight functions.

4 Basic Categories 

Different clustering algorithms respond differently to weights. We introduce a formal categorisation of clustering 
algorithms based on their response to weights. First, we define what it means for a partitional algorithm to be weight 
responsive on a clustering. We present an analogous definition for hierarchical algorithms in Section 6.

Definition 1 (Weight responsive). A partitional clustering algorithm $A$ is weight-responsive on a clustering $C$ of
$(X, d)$ if

1. there exists a weight function $w$ so that $A(w[X], d) = C$, and

2. there exists a weight function $w'$ so that $A(w'[X], d) \neq C$.

Weight-sensitive algorithms are weight-responsive on all clusterings in their range. 

Definition 2 (Weight Sensitive). An algorithm $A$ is weight-sensitive if for all $(X, d)$ and all $C \in \mathrm{range}(A(X, d))$, $A$
is weight-responsive on $C$.

At the other extreme are clustering algorithms that do not respond to weights on any data set. This is the only 
category that has been considered in previous work, corresponding to "point proportion admissibility"[?]. 
Definition 3 (Weight Robust). An algorithm $A$ is weight-robust if for all $(X, d)$ and all clusterings $C$ of $(X, d)$, $A$ is
not weight-responsive on $C$.

Finally, there are algorithms that respond to weights on some clusterings, but not on others.

Definition 4 (Weight Considering). An algorithm $A$ is weight-considering if

• There exists an $(X, d)$ and a clustering $C$ of $(X, d)$ so that $A$ is weight-responsive on $C$.

• There exists an $(X, d)$ and $C \in \mathrm{range}(A(X, d))$ so that $A$ is not weight-responsive on $C$.

To formulate clustering algorithms in the weighted setting, we consider their behaviour on data that allows for
duplicates. Given a data set $(X, d)$, elements $x, y \in X$ are duplicates if $d(x, y) = 0$ and $d(x, z) = d(y, z)$ for all
$z \in X$. In a Euclidean space, duplicates correspond to elements that occur at the same location. We obtain the
weighted version of a data set by de-duplicating the data, and associating every element with a weight equaling the
number of duplicates of that element in the original data. The weighted version of an algorithm partitions the resulting
weighted data in the same manner that the unweighted version partitions the original data. As shown throughout the
paper, this translation leads to natural formulations of weighted algorithms.
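The translation between duplicated and weighted data can be sketched in a few lines of Python. This is only an illustration under the assumption that points are hashable tuples; the helper names `to_weighted` and `to_unweighted` are hypothetical and do not come from the paper.

```python
from collections import Counter

def to_weighted(points):
    """Collapse exact duplicates into distinct points plus integer weights."""
    counts = Counter(points)              # each point must be hashable
    distinct = list(counts)               # keeps first-occurrence order
    weights = {p: counts[p] for p in distinct}
    return distinct, weights

def to_unweighted(distinct, weights):
    """Reverse direction for integer weights: expand weights into duplicates."""
    return [p for p in distinct for _ in range(weights[p])]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
distinct, weights = to_weighted(points)
```

Here the location `(0.0, 0.0)` occurs three times, so it becomes a single element of weight 3 in the weighted data set.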



5 Partitional Methods

In this section, we show that partitional clustering algorithms respond to weights in a variety of ways. We show that
many popular partitional clustering paradigms, including k-means, k-median, and min-sum, are weight sensitive. It is
easy to see that methods such as min-diameter and k-center are weight-robust. We begin by analysing the behaviour of
a spectral objective function, ratio-cut, which exhibits interesting behaviour on weighted data by responding to weight
unless data is highly structured.

5.1 Ratio-Cut Spectral Clustering 

We investigate the behaviour of a spectral objective function, ratio-cut [?], on weighted data. Instead of a distance
function, spectral clustering relies on a similarity function, which maps pairs of domain elements to non-negative real
numbers that represent how alike the elements are.

The ratio-cut of a clustering $C$ is

\[
\mathrm{rcut}(C, w[X], s) = \frac{1}{2} \sum_{C_i \in C} \frac{\sum_{x \in C_i,\, y \in \overline{C_i}} s(x, y) \cdot w(x) \cdot w(y)}{\sum_{x \in C_i} w(x)},
\]

where $\overline{C_i} = X \setminus C_i$.

The ratio-cut clustering function is $\mathrm{rcut}(w[X], s, k) = \arg\min_{C;\, |C| = k} \mathrm{rcut}(C, w[X], s)$. We prove that this function
ignores data weights only when the data satisfies a very strict notion of clusterability. To characterise precisely when
ratio-cut responds to weights, we first present a few definitions.
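To make the objective concrete, the following Python sketch evaluates the weighted ratio-cut cost and finds its minimiser by exhaustive search. It is an illustration only: the brute-force search, the helper names, and the four-point similarity values are assumptions made for this example, and the search is feasible only for very small data sets.

```python
from itertools import product

def rcut_cost(clustering, w, s):
    """Weighted ratio-cut cost of a clustering (a list of lists of points)."""
    points = [x for cluster in clustering for x in cluster]
    total = 0.0
    for cluster in clustering:
        inside = set(cluster)
        # weighted similarity crossing from this cluster to its complement
        cut = sum(s(x, y) * w[x] * w[y]
                  for x in cluster for y in points if y not in inside)
        total += cut / sum(w[x] for x in cluster)
    return 0.5 * total

def k_clusterings(points, k):
    """Yield each partition of `points` into exactly k non-empty clusters once."""
    for labels in product(range(k), repeat=len(points)):
        first_seen = list(dict.fromkeys(labels))
        if first_seen == list(range(k)):   # canonical labelling, all labels used
            yield [[p for p, l in zip(points, labels) if l == j] for j in range(k)]

def rcut(points, w, s, k):
    """Brute-force ratio-cut clustering function."""
    return min(k_clusterings(points, k), key=lambda c: rcut_cost(c, w, s))

# Hypothetical similarities on four points a, b, c, d.
sim = {frozenset("ab"): 1.0, frozenset("cd"): 1.0, frozenset("ac"): 0.5,
       frozenset("ad"): 0.1, frozenset("bc"): 0.1, frozenset("bd"): 0.1}
s = lambda x, y: sim[frozenset((x, y))]
points = list("abcd")

uniform = {p: 1.0 for p in points}
heavy_a = {**uniform, "a": 100.0}
as_sets = lambda clustering: {frozenset(c) for c in clustering}

unweighted_result = as_sets(rcut(points, uniform, s, 2))
weighted_result = as_sets(rcut(points, heavy_a, s, 2))
```

With uniform weights the search returns {a, b}, {c, d}. Giving point a a large weight changes the output to {a, b, c}, {d}, because the between-cluster similarities from a are not all equal.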

A clustering $C$ of $(w[X], s)$ is perfect if for all $x_1, x_2, x_3, x_4 \in X$ where $x_1 \sim_C x_2$ and $x_3 \not\sim_C x_4$, $s(x_1, x_2) >
s(x_3, x_4)$. $C$ is separation-uniform if there exists $\lambda$ so that for all $x, y \in X$ where $x \not\sim_C y$, $s(x, y) = \lambda$. Note that
neither condition depends on the weight function.
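Both conditions can be checked directly from the similarity function. The sketch below is a hypothetical helper, not part of the paper; it assumes that only pairs of distinct points are compared and that a clustering is given as a list of lists.

```python
from itertools import combinations

def _pair_similarities(clustering, s):
    """Split similarities of distinct pairs into within- and between-cluster lists."""
    label = {x: i for i, cluster in enumerate(clustering) for x in cluster}
    within, between = [], []
    for x, y in combinations(label, 2):
        (within if label[x] == label[y] else between).append(s(x, y))
    return within, between

def is_perfect(clustering, s):
    """Every within-cluster similarity exceeds every between-cluster similarity."""
    within, between = _pair_similarities(clustering, s)
    return not within or not between or min(within) > max(between)

def is_separation_uniform(clustering, s, tol=1e-12):
    """All between-cluster similarities share one value (up to `tol`)."""
    _, between = _pair_similarities(clustering, s)
    return not between or max(between) - min(between) <= tol

clustering = [["a", "b"], ["c", "d"]]

# Hypothetical similarities: within-cluster 1.0, between-cluster values vary.
uneven = {frozenset("ab"): 1.0, frozenset("cd"): 1.0, frozenset("ac"): 0.5,
          frozenset("ad"): 0.1, frozenset("bc"): 0.1, frozenset("bd"): 0.1}
# Same within-cluster values, but every between-cluster similarity is 0.2.
even = {pair: (1.0 if pair in (frozenset("ab"), frozenset("cd")) else 0.2)
        for pair in uneven}

s_uneven = lambda x, y: uneven[frozenset((x, y))]
s_even = lambda x, y: even[frozenset((x, y))]
```

Neither function takes a weight argument, which mirrors the remark that both conditions are independent of the weight function.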

We show that whenever a data set has a clustering that is both perfect and separation-uniform, then ratio-cut 
uncovers that clustering, which implies that ratio-cut is not weight-sensitive. Note that these conditions are satisfied 
when all between-cluster similarities are set to 0. On the other hand, we show that ratio-cut does respond to weights 
when either condition fails. 

Lemma 1. Given a clustering $C$ of $(X, s)$ where every cluster has more than one point, if $C$ is not separation-uniform
then ratio-cut is weight-responsive on $C$.

Proof. We consider a few cases.

Case 1: There is a pair of clusters with different similarities between them. Then there exist $C_1, C_2 \in C$, $x \in C_1$,
and $y \in C_2$ so that $s(x, y) \geq s(x, z)$ for all $z \in C_2$, and there exists $a \in C_2$ so that $s(x, y) > s(x, a)$.



Let $w$ be a weight function such that $w(x) = W$ for some sufficiently large $W$ and weight 1 is assigned to all other
points in $X$. Since we can set $W$ to be arbitrarily large, when looking at the cost of a cluster, it suffices to consider the
dominant term in terms of $W$. We will show that we can improve the cost of $C$ by moving a point from $C_2$ to $C_1$. Note
that moving a point from $C_2$ to $C_1$ does not affect the dominant term of clusters other than $C_1$ and $C_2$. Therefore, we
consider the cost of these two clusters before and after rearranging points between these clusters.

Let $A = \sum_{a \in C_2} s(x, a)$ and let $m = |C_2|$. Then the dominant term, in terms of $W$, of the cost of $C_2$ is $W \frac{A}{m}$,
which comes from the similarities between the points of $C_2$ and $x$. The cost of $C_1$ approaches a constant as $W \to \infty$,
because its denominator $w(C_1)$ also grows linearly in $W$.

Now consider the clustering $C'$ obtained from $C$ by moving $y$ from cluster $C_2$ to cluster $C_1$. The dominant term in
the cost of $C_2$ becomes $W \frac{A - s(x, y)}{m - 1}$, and the cost of $C_1$ approaches a constant as $W \to \infty$. If
$\frac{A - s(x, y)}{m - 1} < \frac{A}{m}$, then $C'$ has lower loss than $C$ when $W$ is large enough. This inequality holds whenever $A < s(x, y)\, m$,
and the latter holds by choice of $x$ and $y$.

Case 2: For every pair of clusters, the similarities between them are the same. However, there are clusters
$C_1, C_2, C_3 \in C$, so that the similarities between $C_1$ and $C_2$ are greater than the ones between $C_1$ and $C_3$. Let $a$
denote the similarities between $C_1$ and $C_2$, and $b$ the similarities between $C_1$ and $C_3$.

Let $x \in C_1$. Let $w$ be a weight function such that $w(x) = W$ for large $W$, and weight 1 is assigned to all
other points in $X$. The dominant term comes from clusters going into $C_1$, specifically edges that include point $x$. The
dominant term of the contribution of cluster $C_3$ is $Wb$ and the dominant term of the contribution of $C_2$ is $Wa$, totalling
$Wa + Wb$.

Now consider the clustering $C'$ obtained from clustering $C$ by merging $C_1$ with $C_2$, and splitting $C_3$ into two clusters
(arbitrarily). The dominant term of the clustering comes from clusters other than $C_1 \cup C_2$, and the cost of clusters
outside $C_1 \cup C_2 \cup C_3$ is unaffected. The dominant term of the cost of the two clusters obtained by splitting $C_3$ is $Wb$ for
each, for a total of $2Wb$. However, the factor of $Wa$ that $C_2$ previously contributed is no longer present. Therefore, the
coefficient $a$ in the dominant term is replaced by $b$, which improves the cost of the clustering because $b < a$. □

Lemma 2. Given a clustering $C$ of $(X, s)$ where every cluster has more than one element, if $C$ is not perfect then
ratio-cut is weight-responsive on $C$.

The proof for the above lemma is included in the appendix. 

Lemma 3. Given any data set $(w[X], s)$ that has a perfect, separation-uniform k-clustering $C$, $\mathrm{rcut}(w[X], s, k) =
C$.

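Proof. Let $(w[X], s)$ be a weighted data set, with a perfect, separation-uniform clustering $C = \{C_1, \ldots, C_k\}$. Recall
that for any $Y \subseteq X$, $w(Y) = \sum_{y \in Y} w(y)$, and write $\overline{C_i} = X \setminus C_i$. Then,

\[
\begin{aligned}
\mathrm{rcut}(C, w[X], s)
&= \frac{1}{2} \sum_{i=1}^{k} \frac{\sum_{x \in C_i} \sum_{y \in \overline{C_i}} s(x, y)\, w(x)\, w(y)}{w(C_i)}
 = \frac{\lambda}{2} \sum_{i=1}^{k} \frac{\sum_{x \in C_i} \sum_{y \in \overline{C_i}} w(x)\, w(y)}{w(C_i)} \\
&= \frac{\lambda}{2} \sum_{i=1}^{k} \frac{w(C_i) \sum_{y \in \overline{C_i}} w(y)}{w(C_i)}
 = \frac{\lambda}{2} \sum_{i=1}^{k} \sum_{y \in \overline{C_i}} w(y)
 = \frac{\lambda}{2} \sum_{i=1}^{k} w(\overline{C_i})
 = \frac{\lambda}{2} \sum_{i=1}^{k} \left[ w(X) - w(C_i) \right] \\
&= \frac{\lambda}{2} \left( k\, w(X) - \sum_{i=1}^{k} w(C_i) \right)
 = \frac{\lambda}{2} (k - 1)\, w(X).
\end{aligned}
\]

Consider any other clustering, $C' = \{C'_1, \ldots, C'_k\} \neq C$. Since the perfect clustering is unique, there exists at
least one pair $x \not\sim_{C'} y$ such that $s(x, y) > \lambda$. Since $s(x, y) \geq \lambda$ for every $x, y \in X$, the cost of $C'$ is

\[
\mathrm{rcut}(C', w[X], s)
= \frac{1}{2} \sum_{i=1}^{k} \frac{\sum_{x \in C'_i} \sum_{y \in \overline{C'_i}} s(x, y)\, w(x)\, w(y)}{w(C'_i)}
> \frac{1}{2} \sum_{i=1}^{k} \frac{\sum_{x \in C'_i} \sum_{y \in \overline{C'_i}} \lambda\, w(x)\, w(y)}{w(C'_i)}
= \frac{\lambda}{2} (k - 1)\, w(X)
= \mathrm{rcut}(C, w[X], s).
\]

So clustering $C'$ has a higher cost than $C$. □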
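The closed form in the proof can be checked numerically. The sketch below is an illustrative sanity check rather than part of the argument: the clusters, the value of `lam` (standing in for $\lambda$), the within-cluster similarity of 1.0, and the random weights are all assumptions chosen for the example.

```python
import random

def rcut_cost(clustering, w, s):
    """Weighted ratio-cut cost of a clustering (a list of lists of points)."""
    points = [x for cluster in clustering for x in cluster]
    total = 0.0
    for cluster in clustering:
        inside = set(cluster)
        cut = sum(s(x, y) * w[x] * w[y]
                  for x in cluster for y in points if y not in inside)
        total += cut / sum(w[x] for x in cluster)
    return 0.5 * total

clusters = [["a", "b"], ["c", "d", "e"], ["f", "g"]]
label = {x: i for i, cluster in enumerate(clusters) for x in cluster}
lam = 0.3   # common between-cluster similarity (separation-uniform)

def s(x, y):
    # within-cluster similarity 1.0 > lam, so the clustering is also perfect
    return 1.0 if label[x] == label[y] else lam

random.seed(0)
w = {x: random.uniform(0.5, 10.0) for x in label}   # arbitrary positive weights

cost = rcut_cost(clusters, w, s)
closed_form = lam / 2 * (len(clusters) - 1) * sum(w.values())
```

Whatever positive weights are drawn, `cost` and `closed_form` agree up to floating-point error, as the derivation predicts.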

We can now characterise the precise conditions under which ratio-cut responds to weights. Ratio-cut responds to 
weights on all data sets but those where cluster separation is both very large and highly uniform. Formally, 
Theorem 1. Given a clustering $C$ of $(X, s)$ where every cluster has more than one element, ratio-cut is weight-responsive
on $C$ if and only if either $C$ is not perfect, or $C$ is not separation-uniform.

Proof. The result follows by Lemma 1, Lemma 2, and Lemma 3. □

5.2 K-Means

Many popular partitional clustering paradigms, including k-means, k-median, and min-sum, are weight sensitive.
Moreover, these algorithms satisfy a stronger condition. By modifying weights, we can make these algorithms separate
any set of points. We call such algorithms weight-separable.

Definition 5 (Weight Separable). A partitional clustering algorithm $A$ is weight-separable if for any data set $(X, d)$
and any $S \subset X$, where $2 \leq |S| \leq k$, there exists a weight function $w$ so that $x \not\sim_{A(w[X], d, k)} y$ for all pairs of distinct
points $x, y \in S$.

Note that every weight-separable algorithm is also weight-sensitive.

Lemma 4. If a clustering algorithm $A$ is weight-separable, then $A$ is weight-sensitive.

Proof. Given any $(w[X], d)$, let $C = A(w[X], d, k)$. Select points $x$ and $y$ where $x \sim_C y$. Since $A$ is weight-separable,
there exists $w'$ so that $x \not\sim_{A(w'[X], d, k)} y$, and so $A(w'[X], d, k) \neq C$. □

K-means is perhaps the most popular clustering objective function, with cost

\[
\text{k-means}(C, w[X], d) = \sum_{C_i \in C} \frac{\sum_{\{x, y\} \subseteq C_i} d(x, y)^2 \cdot w(x) \cdot w(y)}{w(C_i)},
\]

where $w(C_i) = \sum_{x \in C_i} w(x)$.¹ The k-means algorithm outputs a clustering with minimal k-means cost. We show
that k-means is weight-separable, and thus also weight-sensitive.
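The pairwise form of the cost can be compared against the familiar centre-of-mass form in Euclidean space. The following Python sketch assumes that the inner sum ranges over unordered pairs of points and that points are coordinate tuples; the function names and the sample data are illustrative only.

```python
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost_pairwise(clustering, w):
    """Weighted k-means cost written with pairwise distances only."""
    total = 0.0
    for cluster in clustering:
        pair_sum = sum(sq_dist(x, y) * w[x] * w[y]
                       for x, y in combinations(cluster, 2))  # unordered pairs
        total += pair_sum / sum(w[x] for x in cluster)
    return total

def kmeans_cost_centroid(clustering, w):
    """Weighted k-means cost written with centres of mass."""
    total = 0.0
    for cluster in clustering:
        mass = sum(w[x] for x in cluster)
        dim = len(cluster[0])
        center = tuple(sum(w[x] * x[i] for x in cluster) / mass for i in range(dim))
        total += sum(w[x] * sq_dist(x, center) for x in cluster)
    return total

clustering = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0), (5.0, 7.0), (6.0, 6.0)]]
w = {(0.0, 0.0): 1.0, (2.0, 0.0): 3.0,
     (5.0, 5.0): 2.0, (5.0, 7.0): 1.0, (6.0, 6.0): 4.0}

pairwise = kmeans_cost_pairwise(clustering, w)
centroid = kmeans_cost_centroid(clustering, w)
```

The two values coincide up to rounding. The pairwise version needs only a distance function, which is why it also applies outside normed vector spaces.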

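Theorem 2. K-means is weight-separable.

Proof. Consider any $S \subseteq X$ with $2 \leq |S| \leq k$. Let $w$ be a weight function over $X$ where $w(x) = W$ if $x \in S$, for large $W$, and $w(x) =
1$ otherwise. Let $m_1 = \min_{x \neq y \in X} d(x, y)^2 > 0$, $m_2 = \max_{x, y \in X} d(x, y)^2$, and $n = |X|$. Consider any k-clustering
$C$ where all the elements in $S$ belong to distinct clusters. Then $\text{k-means}(C, w[X], d) < k m_2 \left(n + \frac{n^2}{W}\right)$. On the other
hand, given any k-clustering $C'$ where at least two elements of $S$ appear in the same cluster, $\text{k-means}(C', w[X], d) \geq
\frac{W^2 m_1}{kW + n}$, since that cluster contains a pair of points of weight $W$ each and has total weight at most $kW + n$. Since
$\lim_{W \to \infty} \frac{\text{k-means}(C', w[X], d)}{\text{k-means}(C, w[X], d)} = \infty$, k-means separates all the elements in $S$ for large enough $W$. □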
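The effect described in the proof can be observed on a toy example. The sketch below minimises the weighted k-means cost by exhaustive search, which is an assumption made for illustration and is only feasible for tiny inputs; the one-dimensional data, the set `S`, and the weight 1000 are likewise hypothetical.

```python
from itertools import combinations, product

def kmeans_cost(clustering, w, d):
    """Pairwise weighted k-means cost (sum over unordered pairs per cluster)."""
    return sum(
        sum(d(x, y) ** 2 * w[x] * w[y] for x, y in combinations(cluster, 2))
        / sum(w[x] for x in cluster)
        for cluster in clustering)

def k_clusterings(points, k):
    """Yield each partition of `points` into exactly k non-empty clusters once."""
    for labels in product(range(k), repeat=len(points)):
        if list(dict.fromkeys(labels)) == list(range(k)):
            yield [[p for p, l in zip(points, labels) if l == j] for j in range(k)]

def weighted_kmeans(points, w, d, k):
    """Exact minimiser of the weighted k-means cost by brute force."""
    return min(k_clusterings(points, k), key=lambda c: kmeans_cost(c, w, d))

def together(clustering, x, y):
    """True if x and y share a cluster."""
    return any(x in cluster and y in cluster for cluster in clustering)

points = [0, 1, 2, 10, 11]
d = lambda x, y: abs(x - y)
S = [0, 1]                                    # points we want to separate

uniform = {p: 1.0 for p in points}
heavy = {p: (1000.0 if p in S else 1.0) for p in points}

before = weighted_kmeans(points, uniform, d, 2)
after = weighted_kmeans(points, heavy, d, 2)
```

With uniform weights the neighbours 0 and 1 share a cluster. Once both carry a large weight, any cluster containing them pays a cost that grows with the weight, so the minimiser places them in different clusters.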

The following result holds using a similar argument. 

Theorem 3. Min-sum, which minimises the objective function $\sum_{C_i \in C} \sum_{x, y \in C_i} d(x, y) \cdot w(x) \cdot w(y)$, is weight-separable.

It can also be shown that a few other algorithms similar to k-means, namely k-median and k-medoids, are also
weight-separable. The details appear in the appendix. Observe that all of these popular objective functions are highly
responsive to weight.



¹ Note that this formulation is equivalent to the common formulation that relies on centers of mass [?], however that formulation applies only
over normed vector spaces.
6 Hierarchical Algorithms



Similarly to partitional methods, hierarchical algorithms also exhibit a wide range of responses to weights. We show 
that Ward's method, a successful linkage-based algorithm, as well as popular divisive hierarchical methods, are weight
sensitive. On the other hand, it is easy to see that the linkage-based algorithms single-linkage and complete-linkage 
are both weight robust, as was observed in [?]. 

Average-linkage, another popular linkage-based method, exhibits more nuanced behaviour on weighted data. 
When a clustering satisfies a reasonable notion of clusterability, then average-linkage detects that clustering
irrespective of weights. On the other hand, this algorithm responds to weights on all other clusterings. We note that the
notion of clusterability required for average-linkage is a lot weaker than the notion discussed in Section 5.1, where it
is used to characterise the behaviour of ratio-cut on weighted data.

Hierarchical algorithms output dendrograms, which contain multiple clusterings. Please see Section 3
for definitions relating to the hierarchical setting. Weight-responsive for hierarchical algorithms is defined
analogously to Definition 1.

Definition 6 (Weight responsive). A clustering algorithm $A$ is weight-responsive on a clustering $C$ of $(X, d)$ if (1)
there exists a weight function $w$ so that $A(w[X], d)$ outputs $C$, and (2) there exists a weight function $w'$ so that
$A(w'[X], d)$ does not output $C$.

Weight-sensitive, weight-considering, and weight-robust are defined as for partitional algorithms in Section 4, with
the above definition for weight-responsive.



6.1 Average Linkage 

Linkage-based algorithms start off by placing every element in its own cluster, and proceed by repeatedly merging the 
"closest" pair of clusters until the entire dendrogram is constructed. To identify the closest clusters, these algorithms 
use a linkage function that maps pairs of clusters to a real number. Formally, a linkage function is a function $\ell :
\{(X_1, X_2, d, w) \mid d, w \text{ over } X_1 \cup X_2\} \to \mathbb{R}^+$.

Average-linkage is one of the most popular linkage-based algorithms (commonly applied in bioinformatics under
the name UPGMA). Recall that $w(X) = \sum_{x \in X} w(x)$. The average-linkage linkage function is

\[
\ell_{AL}(X_1, X_2, d, w) = \frac{\sum_{x \in X_1,\, y \in X_2} d(x, y) \cdot w(x) \cdot w(y)}{w(X_1) \cdot w(X_2)}.
\]
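As a small illustration, the weighted average-linkage function can be written directly from its definition. The function name, the one-dimensional points, and the weights below are assumptions chosen for the example.

```python
def average_linkage(X1, X2, d, w):
    """Weighted average distance between clusters X1 and X2."""
    numerator = sum(d(x, y) * w[x] * w[y] for x in X1 for y in X2)
    return numerator / (sum(w[x] for x in X1) * sum(w[y] for y in X2))

d = lambda x, y: abs(x - y)
X1, X2 = [0.0, 1.0], [4.0]

uniform = {0.0: 1.0, 1.0: 1.0, 4.0: 1.0}
heavy = {0.0: 3.0, 1.0: 1.0, 4.0: 1.0}    # point 0.0 counts three times

plain = average_linkage(X1, X2, d, uniform)        # (4 + 3) / 2
weighted = average_linkage(X1, X2, d, heavy)       # (3 * 4 + 3) / 4
singletons = average_linkage([0.0], [4.0], d, heavy)
```

Raising the weight of the point farther from `X2` pulls the linkage value up from 3.5 to 3.75. Between two singletons the weights cancel and the value is simply the distance, so the first merge of the algorithm never depends on weights.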

To study how average-linkage responds to weights, we present a relaxation of the notion of a perfect clustering. 

Definition 7 (Nice). A clustering $C$ of $(w[X], d)$ is nice if for all $x_1, x_2, x_3 \in X$ where $x_1 \sim_C x_2$ and $x_1 \not\sim_C x_3$,
$d(x_1, x_2) < d(x_1, x_3)$.

Data sets with nice clusterings correspond to those that satisfy the "strict separation" property introduced by Balcan 
et al. [?]. As for a perfect clustering, being a nice clustering is independent of weights. 
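Niceness can be tested with a direct scan over the points. The helper below is a hypothetical sketch that assumes a clustering is a list of lists and that `d` is a distance function on the points.

```python
def is_nice(clustering, d):
    """Each point is strictly closer to its own cluster than to any other point."""
    label = {x: i for i, cluster in enumerate(clustering) for x in cluster}
    for x1 in label:
        same = [d(x1, x2) for x2 in label if x2 != x1 and label[x2] == label[x1]]
        other = [d(x1, x3) for x3 in label if label[x3] != label[x1]]
        if same and other and max(same) >= min(other):
            return False
    return True

d = lambda x, y: abs(x - y)
well_separated = [[0, 1, 2], [10, 11]]
badly_split = [[0, 1], [2, 10, 11]]   # 2 is closer to 0 and 1 than to 10 and 11
```

As with the perfect condition, the check never consults weights.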

We present a complete characterisation of the way that average-linkage (AL) responds to weights, showing that it 
ignores weights on nice clusterings, but responds to weights on all other clusterings. 

Theorem 4. For any data set $(X, d)$ and clustering $C \in \mathrm{range}(AL(X, d))$, average-linkage is weight robust on
clustering $C$ if and only if $C$ is a nice clustering.

The proof of Theorem 4 follows from the two lemmas below.

Lemma 5. If a clustering $C = \{C_1, \ldots, C_k\}$ of $(X, d)$ is not nice, then either $C \notin \mathrm{range}(AL(X, d))$ or average-linkage
is weight-responsive on $C$.

Proof. Assume that there exists some $w$ so that $C$ is a clustering in $AL(w[X], d)$. If it does not exist then we are done. We construct
$w'$ so that $C$ is not a clustering in $AL(w'[X], d)$.

Since $C$ is not nice, there exist $1 \leq i, j \leq k$, $i \neq j$, and $x_1, x_2 \in C_i$, $x_1 \neq x_2$, and $x_3 \in C_j$, so that
$d(x_1, x_2) > d(x_1, x_3)$.

Now, define weight function $w'$ as follows: $w'(x) = 1$ for all $x \in X \setminus \{x_1, x_2, x_3\}$, and $w'(x_1) = w'(x_2) = w'(x_3) = W$, for
some large value $W$. We argue that when $W$ is sufficiently large, $C$ is not a clustering in $AL(w'[X], d)$.

By way of contradiction, assume that $C$ is a clustering in $AL(w'[X], d)$ for every such $W$. Then there is
a step in the algorithm where clusters $X_1$ and $X_2$ merge, where $X_1, X_2 \subset C_i$, $x_1 \in X_1$, and $x_2 \in X_2$. At this point,
there is some cluster $X_3 \subseteq C_j$ so that $x_3 \in X_3$.

We compare $\ell_{AL}(X_1, X_2, d, w')$ and $\ell_{AL}(X_1, X_3, d, w')$. We have $\ell_{AL}(X_1, X_2, d, w') = \frac{W^2 d(x_1, x_2) + \alpha_1 W + \alpha_2}{W^2 + \alpha_3 W + \alpha_4}$ for some
non-negative reals $\alpha_i$. Similarly, $\ell_{AL}(X_1, X_3, d, w') = \frac{W^2 d(x_1, x_3) + \beta_1 W + \beta_2}{W^2 + \beta_3 W + \beta_4}$ for some non-negative reals $\beta_i$.

Dividing numerators and denominators by $W^2$, we see that $\ell_{AL}(X_1, X_3, d, w') \to d(x_1, x_3)$ and $\ell_{AL}(X_1, X_2, d, w') \to d(x_1, x_2)$
as $W \to \infty$, and so the result holds since $d(x_1, x_3) < d(x_1, x_2)$. Therefore average-linkage merges $X_1$ with $X_3$, so
cluster $C_i$ is never formed, and so $C$ is not a clustering in $AL(w'[X], d)$. □

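Finally, average-linkage outputs all nice clusterings present in a data set, regardless of weights.

Lemma 6. Given any weighted data set $(w[X], d)$, if $C$ is a nice clustering of $(X, d)$, then $C$ is in the dendrogram
produced by average-linkage on $(w[X], d)$.

Proof. Consider a nice clustering $C = \{C_1, \ldots, C_k\}$ over $(w[X], d)$. It suffices to show that for any $1 \leq i, j \leq k$ with $i \neq j$,
$X_1, X_2 \subseteq C_i$ where $X_1 \cap X_2 = \emptyset$ and $X_3 \subseteq C_j$, $\ell_{AL}(X_1, X_2, d, w) < \ell_{AL}(X_1, X_3, d, w)$. We have the following
inequalities:

\[
\ell_{AL}(X_1, X_2, d, w)
\leq \frac{\sum_{x_1 \in X_1} \left[ w(x_1) \cdot \max_{x_2 \in X_2} d(x_1, x_2) \cdot \sum_{x_2 \in X_2} w(x_2) \right]}{w(X_1) \cdot w(X_2)}
= \frac{\sum_{x_1 \in X_1} w(x_1) \cdot \max_{x_2 \in X_2} d(x_1, x_2)}{w(X_1)}
\]

and

\[
\ell_{AL}(X_1, X_3, d, w)
\geq \frac{\sum_{x_1 \in X_1} w(x_1) \cdot \min_{x_3 \in X_3} d(x_1, x_3) \cdot \sum_{x_3 \in X_3} w(x_3)}{w(X_1) \cdot w(X_3)}
= \frac{\sum_{x_1 \in X_1} w(x_1) \cdot \min_{x_3 \in X_3} d(x_1, x_3)}{w(X_1)}.
\]

Since $C$ is nice, $\min_{x_3 \in X_3} d(x_1, x_3) > \max_{x_2 \in X_2} d(x_1, x_2)$, and so $\ell_{AL}(X_1, X_3) > \ell_{AL}(X_1, X_2)$. □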
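Both lemmas can be illustrated by running weighted average-linkage on small one-dimensional examples. The following Python sketch is a straightforward, unoptimised implementation written for this illustration; the function names, the data, and the weights are assumptions and not taken from the paper.

```python
import random
from itertools import combinations

def average_linkage(X1, X2, d, w):
    """Weighted average distance between clusters X1 and X2."""
    numerator = sum(d(x, y) * w[x] * w[y] for x in X1 for y in X2)
    return numerator / (sum(w[x] for x in X1) * sum(w[y] for y in X2))

def al_clusters(points, d, w):
    """Run weighted average-linkage and return every cluster (as a frozenset)
    that appears as a node of the resulting dendrogram."""
    current = [frozenset([p]) for p in points]
    nodes = set(current)
    while len(current) > 1:
        a, b = min(combinations(current, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], d, w))
        current = [c for c in current if c not in (a, b)] + [a | b]
        nodes.add(a | b)
    return nodes

d = lambda x, y: abs(x - y)

# Nice clustering {0,1,2}, {10,11}: found under every weighting we try.
nice_points = [0, 1, 2, 10, 11]
random.seed(0)
nice_found = all(
    {frozenset({0, 1, 2}), frozenset({10, 11})}
    <= al_clusters(nice_points, d,
                   {p: random.uniform(0.1, 100.0) for p in nice_points})
    for _ in range(20))

# No nice structure: after {2,3} merges, weights decide which neighbour joins.
line = [0, 2, 3, 5]
left_heavy = al_clusters(line, d, {0: 1.0, 2: 5.0, 3: 1.0, 5: 1.0})
right_heavy = al_clusters(line, d, {0: 1.0, 2: 1.0, 3: 5.0, 5: 1.0})
```

On the second data set the cluster {0, 2, 3} appears when point 2 is heavy, while {2, 3, 5} appears when point 3 is heavy, so average-linkage is weight-responsive there.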

6.2 Ward's Method 

Ward's method is a highly effective clustering algorithm [?], which, at every step, merges the clusters that will yield
the minimal increase to the k-means cost. Let $\mathrm{ctr}(X, d, w)$ be the center of mass of the data set $(w[X], d)$. Then, the
linkage function for Ward's method is

\[
\ell_{Ward}(X_1, X_2, d, w) = \frac{w(X_1) \cdot w(X_2) \cdot d(\mathrm{ctr}(X_1, d, w), \mathrm{ctr}(X_2, d, w))^2}{w(X_1) + w(X_2)}.
\]
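The link between this function and the k-means cost can be checked numerically in Euclidean space. The sketch below assumes points are coordinate tuples and uses the centre-of-mass form of the weighted k-means cost; the helper names and the sample data are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mass(X, w):
    return sum(w[x] for x in X)

def center_of_mass(X, w):
    m = mass(X, w)
    return tuple(sum(w[x] * x[i] for x in X) / m for i in range(len(X[0])))

def ward_linkage(X1, X2, w):
    """Weighted Ward linkage between clusters X1 and X2."""
    m1, m2 = mass(X1, w), mass(X2, w)
    gap = sq_dist(center_of_mass(X1, w), center_of_mass(X2, w))
    return m1 * m2 * gap / (m1 + m2)

def cluster_cost(X, w):
    """Weighted k-means cost of a single cluster (centre-of-mass form)."""
    c = center_of_mass(X, w)
    return sum(w[x] * sq_dist(x, c) for x in X)

X1 = [(0.0, 0.0), (2.0, 0.0)]
X2 = [(5.0, 5.0)]
w = {(0.0, 0.0): 1.0, (2.0, 0.0): 3.0, (5.0, 5.0): 2.0}

linkage = ward_linkage(X1, X2, w)
increase = cluster_cost(X1 + X2, w) - cluster_cost(X1, w) - cluster_cost(X2, w)
```

The value of `linkage` matches `increase`, the growth in k-means cost caused by merging the two clusters, up to rounding.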

Theorem 5. Ward's method is weight sensitive. 
The proof is included in the appendix. 

6.3 Divisive Algorithms 

The class of divisive clustering algorithms is a well-known family of hierarchical algorithms, which construct the den- 
drogram by using a top-down approach. This family of algorithms includes the popular bisecting k-means algorithm. 
We show that a class of algorithms that includes bisecting k-means consists of weight-sensitive methods. 
Given a node $x$ in dendrogram $(T, M)$, let $C(x)$ denote the cluster represented by node $x$. Formally, $C(x) =
\{M(y) \mid y \text{ is a leaf and a descendent of } x\}$. Informally, a $\mathcal{P}$-Divisive algorithm is a hierarchical clustering algorithm
that uses a partitional clustering algorithm $\mathcal{P}$ to recursively divide the data set into two clusters until only single
elements remain. Formally,

Definition 8 ($\mathcal{P}$-Divisive). A hierarchical clustering algorithm $A$ is $\mathcal{P}$-Divisive with respect to a partitional clustering
algorithm $\mathcal{P}$, if for all $(X, d)$, we have $A(w[X], d) = (T, M)$, such that for all non-leaf nodes $x$ in $T$ with children $x_1$
and $x_2$, $\mathcal{P}(w[C(x)], d, 2) = \{C(x_1), C(x_2)\}$.

We obtain bisecting k-means by setting $\mathcal{P}$ to k-means. Other natural choices for $\mathcal{P}$ include min-sum, and exemplar-based
algorithms such as k-median. As shown in Section 5, many of these partitional algorithms are weight-separable.
We show that whenever $\mathcal{P}$ is weight-separable, then $\mathcal{P}$-Divisive is weight-sensitive. The proof of the following theorem
appears in the appendix.

Theorem 6. If $\mathcal{P}$ is weight-separable then the $\mathcal{P}$-Divisive algorithm is weight-sensitive.
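A $\mathcal{P}$-Divisive algorithm can be sketched as a short recursion. In the Python illustration below, $\mathcal{P}$ is an exact weighted 2-means found by exhaustive search, which is an assumption made for this example (practical bisecting k-means uses heuristics); the dendrogram is represented as nested tuples, and all names and data are hypothetical.

```python
from itertools import combinations

def kmeans_cost(clustering, w, d):
    """Pairwise weighted k-means cost (sum over unordered pairs per cluster)."""
    return sum(
        sum(d(x, y) ** 2 * w[x] * w[y] for x, y in combinations(cluster, 2))
        / sum(w[x] for x in cluster)
        for cluster in clustering)

def two_means(points, w, d):
    """Brute-force P with k = 2: the cheapest split into two non-empty clusters."""
    first, rest = points[0], points[1:]
    best, best_cost = None, float("inf")
    for r in range(len(rest)):               # r < len(rest) keeps `right` non-empty
        for extra in combinations(rest, r):
            left = [first, *extra]
            right = [p for p in rest if p not in extra]
            cost = kmeans_cost([left, right], w, d)
            if cost < best_cost:
                best, best_cost = (left, right), cost
    return best

def divisive(points, w, d, split=two_means):
    """Top-down dendrogram as nested tuples; leaves are the points themselves."""
    if len(points) == 1:
        return points[0]
    left, right = split(points, w, d)
    return (divisive(left, w, d, split), divisive(right, w, d, split))

points = [0, 1, 10, 11]
d = lambda x, y: abs(x - y)
uniform = {p: 1.0 for p in points}
heavy = {0: 1000.0, 1: 1000.0, 10: 1.0, 11: 1.0}

plain_tree = divisive(points, uniform, d)
heavy_tree = divisive(points, heavy, d)
```

With uniform weights the root splits the data into {0, 1} and {10, 11}. With large weights on 0 and 1, the root split separates these two points instead, so the dendrogram changes, in line with the weight-sensitivity claim.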





                      Partitional                 Hierarchical

Weight Sensitive      k-means, k-medoids,         Ward's method,
                      k-median, min-sum           bisecting k-means

Weight Considering    Ratio-cut                   Average-linkage

Weight Robust         Min-diameter, k-center      Single-linkage,
                                                  complete-linkage

Table 1: A classification of clustering algorithms based on their response to weighted data.



7 Discussion and Future Work 

In this paper we investigated several classical algorithms, belonging to each of the partitional and hierarchical settings, 
and characterised the exact conditions under which they respond to weights. Our results are summarised in Table 1.
We note that all of our results immediately translate to the standard setting, by mapping each point with integer weight 
to the same number of unweighted duplicates. 

In particular, we proved precisely when the weight considering methods, average-linkage and ratio-cut, respond 
to weights. It is interesting to note that the response of these weight considering techniques is substantially different. 
Ratio cut ignores weights only on data that is exceptionally well-structured, having large and highly uniform cluster 
separation. Yet average linkage requires a much weaker condition, finding all clusterings where data are closer to other 
elements in their partition than to data outside their cluster. Intuitively, average linkage uses weights as a secondary
source of information, relying on them only when the clustering structure is ambiguous. 

There are a number of interesting avenues for future investigation. A compelling question left open is to understand 
the correlation between the weight responsiveness of an algorithm and the quality of clusterings that it produces in 
the classical setting. As an example, observe that many notable algorithms, such as k-means and spectral methods,
respond to weights, while less used approaches, such as single linkage, never do. It would also be interesting to 
perform a quantitative analysis to measure the exact degree of responsiveness to weights, which may lead to a more 
fine grained classification of these algorithms. In addition, it remains to be determined how the approximations used 
in practice, such as spectral clustering heuristics and the Lloyd method, behave on weighted data. Our preliminary 
work on these heuristics lends further support to the hypothesis that the more commonly applied algorithms are also 
more responsive to weights. 